Description

[0001] The present invention relates to maintaining 
parity information on computer data storage devices 
and in particular to maintaining availability of a computer 
system when reconstructing data from a failed storage 
device. 

[0002] The extensive data storage needs of modern 
computer systems require large capacity mass data 
storage devices. A common storage device is the mag- 
netic disk drive, a complex piece of machinery contain- 
ing many parts which are susceptible to failure. A typical 
computer system will contain several such units. As us- 
ers increase their need for data storage, systems are 
configured with larger numbers of storage units. The fail- 
ure of a single storage unit can be a very disruptive event 
for the system. Many systems are unable to operate until 
the defective unit is repaired or replaced, and the lost 
data restored. An increased number of storage units in- 
creases the probability that any one unit will fail, leading 
to system failure. At the same time, computer users are 
relying more and more on the consistent availability of 
their systems. It therefore becomes essential to find im- 
proved methods of reconstructing data contained on a 
failing storage unit, and sustaining system operations in 
the presence of a storage unit failure. 
[0003] One method of addressing these problems is 
known as "mirroring". This method involves maintaining 
a duplicate set of storage devices, which contains the 
same data as the original. The duplicate set is available 
to assume the task of providing data to the system 
should any unit in the original set fail. Although very effective,
this is a very expensive method of resolving
the problem, since a customer must pay for twice as
many storage devices. 

[0004] A less expensive alternative is the use of parity 
blocks. Parity blocks are records formed from the Ex- 
clusive-OR of all data records stored at a particular lo- 
cation on different storage units. In other words, each 
bit in a block of data at a particular location on a storage 
unit is Exclusive-ORed with every other bit at that same 
location in each storage unit in a group of units to pro- 
duce a block of parity bits; the parity block is then stored 
at the same location on another storage unit. If any stor- 
age unit in the group fails, the data contained at any lo- 
cation on the failing unit can be regenerated by taking 
the Exclusive-OR of the data blocks at the same location 
on the remaining devices and their corresponding parity 
block. 
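
The parity relationship described above can be illustrated with a minimal sketch. The block contents and the helper name `xor_blocks` are assumptions chosen for illustration only; they do not come from the cited patents.

```python
from functools import reduce

def xor_blocks(blocks):
    """Bytewise Exclusive-OR of equally sized blocks (hypothetical helper)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Three data blocks stored at the same location on three storage units
# (illustrative values).
data = [bytes([0x0F, 0xA0]), bytes([0x33, 0x0C]), bytes([0x55, 0xFF])]

# The parity block is stored at the same location on a fourth unit.
parity = xor_blocks(data)

# Suppose the unit holding data[1] fails: the lost block is regenerated by
# XORing the blocks on the remaining units with the parity block.
rebuilt = xor_blocks([data[0], data[2], parity])
```

Because Exclusive-OR is its own inverse, the same routine serves both to generate parity and to regenerate a lost block.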

[0005] U.S. Pat. No. 4,092,732 (Ouchi et al.) de- 
scribes a parity block method. In said patent, a single 
storage unit is used to store parity information for a 
group of storage devices. A read and a write on the stor- 
age unit containing parity blocks occurs each time a 
record is changed on any of the storage units in the 
group covered by the parity record. Thus, the storage 
unit with the parity records becomes a bottleneck to storage
operations. EP-A-0 249 091 (Clark et al.) improves
upon storage of parity information by distributing parity
blocks substantially equally among a set of storage
units. N storage units in a set are divided into a multiple
of equally sized address blocks, each containing a plurality
of records. Blocks from each storage unit having
the same address ranges form a stripe of blocks. Each
stripe has a block on one storage device containing parity
for the remaining blocks of the stripe. The parity
blocks for different stripes are distributed among the different
storage units in a round robin manner.
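
One way to picture a round robin distribution of parity blocks is sketched below. The rotation rule (`stripe % n_units`) and the function names are assumptions made for illustration; the text states only that parity is distributed in a round robin manner.

```python
def parity_unit(stripe: int, n_units: int) -> int:
    """Index of the unit holding parity for a stripe (assumed rotation rule)."""
    return stripe % n_units

def stripe_layout(stripe: int, n_units: int) -> list:
    """Mark each unit's block in the stripe as parity ('P') or data ('D')."""
    p = parity_unit(stripe, n_units)
    return ["P" if unit == p else "D" for unit in range(n_units)]

# Show which unit holds parity for the first few stripes of a 4-unit set.
for s in range(4):
    print(s, stripe_layout(s, 4))
```

Under this assumed rule each unit holds parity for every N-th stripe, so parity updates are spread across all units rather than concentrated on one.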

[0006] The use of parity records as described in the 
Ouchi and Clark patents substantially reduces the cost 
of protecting data when compared to mirroring. However,
while Ouchi and Clark teach a data recovery or protection
means, they do not provide a means to keep a
system operational to a user during data reconstruction.
Normal operations are interrupted while a memory controller
is powered down to permit a repair or replacement
of the failed storage device, followed by a reconstruction
of the data. Since this prior art relies exclusively on software
for data reconstruction, the system can be disabled
for a considerable time.

[0007] Prior art does not teach dynamic system recovery
and continued operation without the use of duplicate
or standby storage units. Mirroring requires a doubling
of the number of storage units. A less extreme approach
is the use of one or more standby units, i.e., additional
spare disk drives which can be brought on line in the
event any unit in the original set fails. Although this does
not entail the cost of a fully mirrored system, it still requires
additional storage units which otherwise serve no
useful function.

[0008] An article by Randy H. Katz, 'A project on High
Performance I/O Subsystems', in '88345 Computer Architecture
News (September 1989), discloses disk
arrays providing high reliability. The array of disks is organized
into separate groups for recovery purposes.
High reliability is reached at very low cost by adding
just one extra disk per group to hold the parity calculated
horizontally across the group. Redundancy and the spreading of
data and parity bits over the storage units are the characteristics
of the N+1 RAID solution discussed in the article
'A case for Redundant Arrays of Inexpensive Disks'
of the ACM SIGMOD Conference, Chicago, Illinois, June 1-3,
1988. N+1 RAID is also the level 5 RAID presented in
a previous article by the same authors. In the article the
mirrored RAID and the N+1 RAID solutions are compared
in terms of cost and performance. Because the parity of
one group of disks is calculated on a per-bit basis, any
single disk failure can be corrected by reading the rest
of the disks in the group to determine what bit value on
the failed disk would give the proper parity. A bottleneck
at the parity disk is avoided by spreading the parity over
several disks (page 113 5. N+1 RAID paragraph). Even
if the articles provide a solution with low cost disks and
an attractive cost/performance ratio (in terms of reliability
and data access time), none of these documents
deals with the problems of dynamic reconstruction of data
after a failure; these problems concern the storage
space used for reconstruction of data and the availability
of the disk array for the computer system using it.
[0009] It is therefore an object of the present invention
to provide an enhanced method and apparatus for recovering
from data loss in a computer system having
multiple data storage units.

[0010] It is a further object of this invention to provide 
an enhanced method and apparatus whereby a computer
system having multiple data storage units may continue
to operate if one of the data storage units fails.
[0011] Another object of this invention is to reduce the 
cost of protecting data in a data processing system hav- 
ing multiple protected storage units. 
[0012] A still further object of this invention is to increase
the performance of a computer system having
multiple data storage units when one of the data storage 
units fails and the system must reconstruct the data con- 
tained on the failed unit. 

[0013] A storage controller services a plurality of data
storage units. A storage management mechanism resident
on the controller maintains parity records on the
storage units it services. Data and parity blocks are organized
as described in the patent to Clark et al. In the
event of a storage unit failure, the system continues to
operate. The storage management mechanism recon- 
structs data that was on the failed unit as attempts are 
made to access that data, and stores it in the parity block 
areas of the remaining storage units. 
[0014] The storage management mechanism includes
a status map indicating, for each data block, the
location of the corresponding parity block, and the status
of the data block. If a storage unit fails, the storage management
mechanism is placed in a failure operating
mode. While in failure operating mode, the storage management
mechanism checks the status map before accessing
data on the failed storage unit. If the data has
not yet been reconstructed, storage management must
first reconstruct the data in that block of storage by successively
reading and accumulating an Exclusive-OR
(XOR) of the same blocks on all storage units in the parity
group, including the parity block. The block of data
resulting from this Exclusive-OR is the reconstructed
data, which is then stored in the location of the parity
block. The status map is then updated to indicate that
the block has been reconstructed. Once the data has
been reconstructed, it is only necessary to read from or
write to the former parity block directly. In the same manner,
storage management will reconstruct the data from
a block of storage on the failed unit before writing to any
other block on the same stripe (on a non-failed unit).
This is required because the write operation to any block
on the stripe will alter parity, making it impossible to later
reconstruct the block of data on the failed unit. Thus,
upon failure of a storage unit, system performance is
initially degraded as read and write operations cause
storage management to reconstruct data. As data is rebuilt,
performance quickly improves.
[0015] In the preferred embodiment, the storage units
are organized and parity information is generated and 
stored as described in the Clark et al. patent. Recon- 
structed data is stored in locations where parity data is 
normally stored for the stripe on which the lost data re- 
sided. There is no need to power down the storage con- 
troller or any other part of the system, repair the failed 
storage unit, and then reconstruct the lost data. In this 
preferred embodiment, the data are recovered and 
stored while a computer system using this storage man- 
agement mechanism remains completely available to a 
user. The storage units operate without parity protection 
until the failed unit is repaired or replaced. This embod- 
iment achieves continuous operation and single-level 
failure protection at very little additional cost. 
[0016] In a first alternate embodiment, spare areas of 
storage in each non-failing storage unit are allocated to 
the reconstructed data. The total of these spare areas 
constitutes a virtual spare storage unit. As data is reconstructed,
it is placed in the virtual spare unit, and parity
is maintained in the normal fashion. This alternative 
achieves an additional level of failure protection, be- 
cause parity data continues to be maintained after a sin- 
gle storage unit failure. However, it may impose a need 
for additional storage space for the spare areas, or 
cause degraded performance if these spare areas are 
normally used for other purposes, such as temporary 
data storage. 

[0017] In a second alternate embodiment, the storage 
management mechanism resides in the host system's 
operating software, but otherwise performs the same 
functions as a storage management mechanism resid- 
ing on a storage controller. This embodiment will gener- 
ally be slower than the preferred embodiment, but may 
reduce the cost of the storage controller. 

Fig. 1 is a block diagram of a system incorporating 
the components of the preferred embodiment of this 
invention; 

Fig. 2 is a diagram of a status map; 

Fig. 3 is a flow diagram of the steps involved in a 
read operation during normal operating mode; 

Fig. 4 is a flow diagram of the steps involved in 
transferring data to be written from the host to the 
storage controller; 

Fig. 5 is a flow diagram of the steps involved in writ- 
ing data to a storage device in normal operating 
mode; 

Fig. 6 is a flow diagram of the steps involved in read
operations following a storage device failure;

Fig. 7 is a flow diagram of the steps involved in writing
data to a storage device when a storage device
has failed;

Fig. 8 is a block diagram of a system incorporating
the components according to an alternative embodiment
of this invention.

[0018] A block diagram of the major components of
computer system 100 of the preferred embodiment of
the present invention is shown in Figure 1. A host system
101 communicates over a bus 102 with a storage
controller 103. Controller 103 comprises a programmed
processor 104, non-volatile RAM 105, Exclusive-OR
hardware 108, and cache memory (RAM) 109. Non-volatile
RAM 105 contains a status map 106 and table of
contents 107. Controller 103 controls the operation of
storage units 121-124. In the preferred embodiment,
units 121-124 are rotating magnetic disk storage units.
While four storage units are shown in Fig. 1, it should
be understood that the actual number of units attached
to controller 103 is variable. It should also be understood
that more than one controller 103 may be attached to
host system 101. In the preferred embodiment, computer
system 100 is an IBM AS/400 computer system, although
any computer system could be used.
[0019] The storage area of each storage unit is divid- 
ed into blocks 131-138. In the preferred embodiment, 
all storage units have identical storage capacity, and all 
parity protected blocks are the same size. While it would be
possible to employ this invention in configurations of 
varying sized storage units or varying sized blocks, the 
preferred embodiment simplifies the control mecha- 
nism. 

[0020] The set of all blocks located at the same loca- 
tion on the several storage units constitutes a stripe. In 
Fig. 1, storage blocks 131-134 constitute a first stripe, 
and blocks 135-138 constitute a second stripe. One of 
the blocks in each stripe is designated the parity block. 
Parity blocks 131, 136 are shown shaded in Fig. 1. The
remaining unshaded blocks 132-135, 137-138 are data
storage blocks for storing data. The parity block for the 
first stripe, consisting of blocks 131-134, is block 131. 
The parity block contains the Exclusive-OR of data in 
the remaining blocks on the same stripe. 
[0021] In the preferred embodiment, parity blocks are 
distributed across the different storage units in a round 
robin manner, as shown in Fig. 1. Because with every 
write operation the system must not only update the 
block containing the data written to, but also the parity 
block for the same stripe, parity blocks are usually mod- 
ified more frequently than data blocks. Distributing parity 
blocks among different storage units will in most cases 
improve performance by distributing the access work- 
load. However, such distribution is not necessary to 
practicing this invention, and in an alternate embodiment
it would be possible to place all parity blocks on a
single storage unit. 

[0022] In the preferred embodiment, one block of every
stripe is dedicated to parity information. As an alternative
embodiment, one of the stripes contains no parity
protection. This stripe is reserved for temporary data
which does not require protection. Fig. 8 shows this alternate
embodiment in the stripe consisting of blocks
811-814. Because it is extra storage space not a part of
the parity data protection scheme, this block may be of 
any arbitrary size. 

[0023] The allocation of storage area into stripes as 
described above, each containing blocks of data and a 
parity block, is the same as that described in U.S. Patent
4,761,785 to Clark, et al., which is incorporated by ref- 
erence. 

[0024] Storage controller 103 includes programmed
processor 104 executing a storage management program.
The operation of the storage management program
is described below. Controller 103 also includes
hardware Exclusive-OR circuitry 108, for computing the
Exclusive-OR of data in non-volatile RAM 105 or cache
RAM 109. In an alternative embodiment, the Exclusive-OR
operations could be performed by processor 104,
but special hardware for this purpose will improve performance.

[0025] Non-volatile RAM 105 is used by controller 103
as a temporary queueing area for data waiting to be
physically written to a storage unit. In addition to this
temporary data, status map 106 and table of contents
107 are stored in non-volatile RAM 105. Table of contents
107 maps the data waiting to be written
to the location at which it is to be stored on the storage
unit.

[0026] Status map 106 is used to identify the location
of the corresponding parity block for each data block,
and the status of each block of data during failure recovery
mode. Status map 106 is shown in detail in Fig. 2.
It contains a separate table of status map entries for
each storage unit. Each status map entry 201 contains
the location 202 of a block of data on the storage unit,
a status bit 203 indicating whether or not the data needs
to be recovered when operating in failure mode, and the
location of the corresponding parity block 204.
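
The status map structure can be sketched as follows. The field encodings, the class and function names, the round robin parity rule, and the choice to give parity blocks no entry of their own are all assumptions for illustration; the text specifies only the three fields 202, 203 and 204 and one table per storage unit.

```python
from dataclasses import dataclass

@dataclass
class StatusMapEntry:
    location: int         # field 202: location of the data block on this unit
    status: int           # field 203: 0 = needs recovery in failure mode, 1 = recovered
    parity_unit: int      # field 204 (assumed encoding): unit holding the parity block
    parity_location: int  # field 204 (assumed encoding): block location on that unit

def build_status_map(n_units: int, n_stripes: int) -> dict:
    """Build one table of status map entries per storage unit."""
    status_map = {}
    for unit in range(n_units):
        table = []
        for stripe in range(n_stripes):
            p_unit = stripe % n_units  # assumed round robin parity placement
            if unit != p_unit:         # assumption: only data blocks get an entry
                table.append(StatusMapEntry(stripe, 0, p_unit, stripe))
        status_map[unit] = table
    return status_map

status_map = build_status_map(n_units=4, n_stripes=2)
```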

[0027] Referring again to Fig. 1, cache memory 109
is a volatile random access memory that is used to store
data read from a storage unit. It serves as a buffer when
transferring data from a storage unit to host system 101
in a read operation. In addition, data is saved in cache
109 in response to indications from the host system 101
that the data has a high probability of modification and
re-writing. Because unmodified data must be exclusive-ORed
with modified data to update the corresponding
parity data, saving read data in cache 109 can eliminate
the need to read it again immediately before a write operation.
Cache 109 exists only to improve performance.
In an alternative embodiment, it would be possible to
practice this invention without it. Cache 109 is identified
as a volatile RAM because it is not necessary to the integrity
of the system that data read from storage be preserved
in non-volatile memory. However, the cache
could be implemented as part of the non-volatile memory
105. Depending on the relative cost and size of
memory modules, such an approach may be desirable.
[0028] The function of the system in conjunction with 
the hardware and software features necessary to this 
invention is described below. The system has two oper- 
ating modes: normal and failure mode. The system op- 
erates in normal mode when all disk storage devices are 
functioning properly. When one storage device fails, the
mode of operation changes to failure mode, but the sys- 
tem continues to operate. 

[0029] A READ operation in normal mode is shown in 
Fig. 3. The READ operation is performed by accepting 
a READ command from the host at step 301, and determining
whether the data requested exists in non-volatile
RAM 105 or cache 109 at step 302. If so, the data in 
non-volatile RAM or cache is sent directly to the host at 
step 304. Otherwise, data is first read from the appro- 
priate storage unit into the cache 109 at step 303, and 
from there transferred to the host system at step 304. 
The cache 109 also improves performance during 
WRITE operations. If the original version of data to be 
updated is already in cache 109 when a WRITE opera- 
tion is processed, it is not necessary to read the data 
again in order to update parity, thus improving system 
performance. The contents of cache 109 are managed 
using any of various cache management techniques 
known in the art. 
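
The normal mode READ flow of Fig. 3 might be sketched as below. Modelling non-volatile RAM and the cache as dictionaries, checking non-volatile RAM before the cache, and the `read_from_unit` callback are assumptions for illustration, not details given in the text.

```python
def read_normal(address, nvram, cache, read_from_unit):
    """Serve one READ command in normal mode (steps 301-304, simplified)."""
    if address in nvram:        # step 302: data still queued for writing
        return nvram[address]   # step 304: send directly to the host
    if address not in cache:    # step 302: not cached either
        cache[address] = read_from_unit(address)  # step 303: read into cache
    return cache[address]       # step 304: transfer from cache to the host
```

A second read of the same address is then served from the cache without touching the storage unit.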

[0030] A WRITE operation is performed by two asynchronous
tasks running in the storage controller's processor
104. One task communicates with the host via bus
102, and is shown in Fig. 4. The WRITE operation begins
when it accepts a WRITE command from the host
at step 401. It then checks table of contents 107 to determine
whether sufficient space is available in non-volatile
RAM 105 to store the data to be written to storage
in step 402. (Note that space available includes space
used by back-level versions of the data to be written, as
well as unused space.) If space is not available, controller
103 cannot receive data from the host, and must
wait for space to become available at step 403 (i.e., it
must wait for data already in non-volatile RAM 105 to
be written to storage 121-124). When space becomes
available in non-volatile RAM 105, data is copied from
host 101 into non-volatile RAM 105, and table of contents
107 is updated at step 404. Processor 104 then
issues an operation complete message to the host at
step 405. Upon receipt of the operation complete message,
the host is free to continue processing as if the
data were actually written to storage 121-124, although
in fact the data may wait awhile in non-volatile RAM 105.
From the host's perspective, the operation will appear
to be complete.

[0031] The second asynchronous task writes data
from non-volatile RAM 105 to a storage unit. A flow diagram
of this task in normal mode is shown in Fig. 5.
The task selects a WRITE operation from among those
queued in non-volatile RAM at step 501. The selection
criteria are not a part of this invention, and could be, e.g.,
First-in-first-out, Last-in-first-out, or some other criteria
based on system performance and other considerations.
When the WRITE operation is performed, parity
must be updated. By taking the Exclusive-OR of the new
write data with the old data, it is possible to obtain a bit
map of those bits being changed by the WRITE operation.
Exclusive-ORing this bit map with the existing parity
data produces the updated parity data. Therefore, before
writing to storage, the task first checks whether the
old data exists in the cache 109 in unmodified form at
step 502. If not, it is read into the cache from storage at
step 503. This old data in the cache is then Exclusive-ORed
with the new data in non-volatile RAM to produce
the bit map of changed data at step 504. The bit map is
saved temporarily in non-volatile RAM 105 while the
new data is written to one of the storage devices
121-124. The old parity data is then read into the cache
(if not already there) at steps 506, 507, and Exclusive-ORed
with the bit map to produce the new parity data
at step 508. This new parity data is written to one of the
storage devices 121-124 and the table of contents is updated
at step 509, completing the WRITE operation.
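
The parity update described above can be sketched in a few lines. Here a stripe is modelled as a Python list of equally sized byte blocks, and the cache, non-volatile RAM and table of contents are omitted; these simplifications and the function names are assumptions for illustration.

```python
def xor(a: bytes, b: bytes) -> bytes:
    """Bytewise Exclusive-OR of two equally sized blocks."""
    return bytes(x ^ y for x, y in zip(a, b))

def write_normal(stripe, unit, new_data, parity_unit):
    """Update one data block and the stripe's parity block in normal mode."""
    old_data = stripe[unit]               # steps 502-503: obtain the old data
    change_map = xor(old_data, new_data)  # step 504: bit map of changed bits
    stripe[unit] = new_data               # write the new data
    old_parity = stripe[parity_unit]      # steps 506-507: obtain the old parity
    stripe[parity_unit] = xor(old_parity, change_map)  # steps 508-509: new parity
```

Only the block being written and the parity block are accessed; the other data blocks of the stripe need not be read.
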
[0032] When a storage unit failure is detected, the
system begins operating in failure mode. The failure of
a storage unit means failure to function, i.e., to access
data. Such a failure is not necessarily caused by a
breakdown of the unit itself. For example, the unit could
be powered off, or a data cable may be disconnected.
From the perspective of the system, any such failure,
whatever the cause, is a failure of the storage unit. Detection
mechanisms which detect such failures are
known in the art. Common mechanisms include a time-out
after not receiving a response, and continued high
error rates in received data.

[0033] Figure 6 illustrates the READ operation when
the system is operating in failure mode. As in the case
of normal mode READ operations, when a READ is accepted
from the host at step 601, the controller first
checks its non-volatile RAM 105 and its volatile cache
109 for the desired data at step 602. If the data exists
in non-volatile RAM or cache, the data is transferred to
the host via system bus 102. If the data is not in non-volatile
RAM or cache, and resides on a storage device
which has not failed (step 603), the data is read into the
cache from the storage device in the normal manner at
step 604. If the data resides on a failed storage unit, the
controller checks the status map entry 201 in status map
106 for the location in storage of the desired data at step
605. The status map entry will indicate whether the data
has been recovered, i.e., whether it has been reconstructed
by exclusive-ORing and stored at some alternate
location. If the status map indicates that the data
has not been recovered (step 605), the controller successively
reads the corresponding locations on all storage
units except the failing one at step 608. Each block
of data read is XORed by the XOR hardware 108 with
the accumulated XOR results of the previously read
blocks. The final XOR results constitute the reconstructed
data of the failed device. This reconstructed data is
written to the parity block corresponding to this block of
data at step 609. The location of this block is stored in
a parity block address field 204 of the status map 106.
After writing the recovered data to the parity block location,
status map 106 is updated at step 610 by changing
the status bit 203 of each block in the same stripe to a
'1' to indicate that the data has been recovered. The reconstructed
data is sent to the host at step 611. If the
status bit 203 originally contained a '1', indicating that
data had been recovered, the controller would obtain the
location of the former parity block area (where recovered
data is stored) from the status map at step 606,
and read the data from this location directly into the
cache at step 607. By this means, it is only necessary
to read all disk storage units once to recover any particular
block of data. Once recovered, the physical storage
location of that data is effectively relocated to the location
that was formerly used for parity storage, and any
subsequent reads of that block need only read the one
storage unit.
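
The failure mode READ path can be sketched as follows. The in-memory model (`units[u][s]` holding the block of unit `u` in stripe `s`, and a per-stripe `status` dictionary standing in for status bits 203) and all names are assumptions for illustration; the cache and non-volatile RAM checks of step 602 are omitted.

```python
from functools import reduce

def xor_all(blocks):
    """Accumulate the Exclusive-OR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

def read_failure_mode(units, failed, unit, stripe, parity_unit, status):
    """Serve a READ in failure mode (steps 603-611, simplified)."""
    if unit != failed:                     # step 603: unit has not failed
        return units[unit][stripe]         # step 604: normal read
    if status[stripe] == 1:                # step 605: already recovered
        return units[parity_unit][stripe]  # steps 606-607: former parity block
    survivors = [units[u][stripe]          # step 608: read every unit except
                 for u in range(len(units)) if u != failed]  # the failed one
    rebuilt = xor_all(survivors)
    units[parity_unit][stripe] = rebuilt   # step 609: overwrite the parity block
    status[stripe] = 1                     # step 610: mark stripe as recovered
    return rebuilt                         # step 611: send to the host
```

The first read of a lost block touches every surviving unit; later reads of the same block touch only the unit holding the former parity block.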

[0034] Figure 7 illustrates the write to storage opera- 
tion when the system is operating in failure mode. As 
with the normal mode WRITE, a host communications 
task shown in Fig. 4 receives data to be written from the 
host via bus 102. The write to storage task selects a 
write operation from the queue in non-volatile RAM 105 
at step 701. The controller determines whether the data
is to be written to a failed unit (step 702) and checks the 
status map (steps 703, 709). If the data is to be written 
to a failing unit, and the data in the block has not yet 
been recovered, the block must be recovered before 
any write operations are possible. Recovery follows the 
same steps described above for a READ operation. 
Each block in the same stripe of blocks (including the 
parity block) is read in turn, and its contents Exclusive- 
ORed with the cumulative Exclusive-OR of the previous- 
ly read blocks at step 704. The result, which is the re- 
constructed data, is written to the location used for the 
parity block at step 705. Once the recovery of the entire 
block is complete, the new data (which would typically 
encompass only a portion of the block) is written over 
the recovered data in the former parity location at step 
706, and the status map updated to indicate that the 
block has been recovered at step 707. If data is to be 
written to a failing unit, but the data has already been 
recovered, it is written directly to the former parity loca- 
tion, now used for storage of recovered data, at step 
708. 

[0035] If data is being written to a non-failing unit
when operating in failure mode, the controller checks
the status map at step 709. If the status is '1', indicating
that the block of data in the same stripe on the failing
unit has already been recovered, the WRITE data is written
directly to the non-failing storage unit at step 710. If
the status is '0', data cannot be directly written to the
non-failing unit, because such an operation would alter
parity, making it impossible to later reconstruct the corresponding
data in the failed unit. Accordingly, in the
preferred embodiment, the controller will first recover
the block of data in the same stripe on the failing unit.
As shown in Fig. 7, the block of data in the failing unit is
first reconstructed by Exclusive-ORing at step 711, and
saved in the parity block location at step 712, following
the steps described above. The WRITE data is then written
to its storage unit at step 713, and the status map is
updated at step 714. Note that if the parity block for the
stripe containing the data to be written is on the failing
unit, no reconstruction is necessary, since parity will be
lost anyway. Therefore, the status for all blocks on this
stripe is set to '1' when the storage unit failure is detected.
The effect will be to cause data on this stripe to be directly
written to storage as if the corresponding block on
the failing unit had already been recovered. For example,
referring to Fig. 1, if storage unit 121 fails, the controller
will immediately set the status of blocks 132-134
to '1', so that WRITE operations to these blocks can proceed
directly. In an alternative embodiment, if the
WRITE operation is to a non-failing unit, and the corresponding
block on the failing unit has not been recovered,
it would be possible to follow the same steps used
for a normal mode WRITE operation to update the parity
block, preserving the ability to reconstruct the failing
unit's data later if a READ or WRITE of the data on the
failed unit is requested.
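
A failure mode WRITE to a non-failing unit, together with the presetting of status bits when the failure is detected, might be sketched as below. The in-memory model (`units[u][s]`, a per-stripe `status` dictionary) and the names `initial_status` and `write_to_surviving_unit` are assumptions for illustration.

```python
from functools import reduce

def xor_all(blocks):
    """Accumulate the Exclusive-OR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

def initial_status(n_stripes, failed, parity_unit_of):
    """On failure detection, preset status to 1 where parity was on the failed unit."""
    return {s: 1 if parity_unit_of(s) == failed else 0 for s in range(n_stripes)}

def write_to_surviving_unit(units, failed, unit, stripe, parity_unit, status, new_data):
    """WRITE to a non-failing unit in failure mode (steps 709-714, simplified)."""
    if status[stripe] == 1:                   # step 709: nothing left to recover
        units[unit][stripe] = new_data        # step 710: write directly
        return
    survivors = [units[u][stripe]
                 for u in range(len(units)) if u != failed]
    units[parity_unit][stripe] = xor_all(survivors)  # steps 711-712: rebuild and save
    units[unit][stripe] = new_data            # step 713: now safe to write
    status[stripe] = 1                        # step 714: update the status map
```

Reconstruction must precede the write because, once the data block changes, the old parity no longer matches the stripe and the lost block could not be rebuilt.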

[0036] In the preferred embodiment, parity blocks are
used to store reconstructed data, with the result that the
system runs without parity protection after a single storage
unit failure. An alternative embodiment is possible
where a sufficiently large spare storage stripe or stripes
is reserved on the storage units, as shown in Fig. 8. This
spare storage stripe might contain temporary data which
does not require parity protection and which can be
overwritten if the need arises, or it might contain no data
at all. In this alternative embodiment, reconstructed data
is relocated to a block of a spare storage stripe 811-814
instead of the parity block. This alternative is only possible
where sufficient spare storage exists to accommodate
the non-spare contents of the failed unit. It would
also have the consequence of reducing the amount of
temporary storage available to the system, possibly degrading
performance or reducing the number of users
the system can service. In this alternative embodiment,
normal mode READ and WRITE operations are performed
in exactly the same manner as in the preferred
embodiment. When operating in failure mode, the status
map is checked, and the data reconstructed as needed,
in the manner described above. However, instead of
writing the reconstructed data to the parity block, it is
written to a block in spare storage. Another field is required
in status map 106 to record the new location of
the data which was contained on the failed unit. In addition,
with any WRITE operation parity is updated in the
same manner as a WRITE operation in normal mode.
This is done after any reconstruction of data on the failed
unit. 
[0037] In another alternative embodiment, parity protection and mirroring
are combined on the same system. Some of the data contained on the storage
units is protected by the parity protection mechanism described herein,
while other data is mirrored. In the event of a storage unit failure, the
parity protected data is reconstructed and stored as described above,
while the mirrored data is accessed from the storage unit containing the
mirrored copy.
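
As a rough illustration of how a combined system might choose between the two recovery paths, the following Python sketch dispatches on a per-region protection scheme. The region dictionary layout, the "scheme" key and the function name read_lost_block are assumptions made for this example only.

```python
from functools import reduce


def xor_blocks(blocks):
    """Return the bytewise Exclusive-OR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def read_lost_block(region, failed_unit):
    """Return the data that the failed unit held for this region.

    Assumed layout: a mirrored region has region["copies"] mapping unit
    number to an identical copy; a parity protected region has
    region["blocks"] mapping unit number to a data or parity block.
    """
    if region["scheme"] == "mirror":
        # Mirrored data: access the copy on the surviving unit.
        for unit, block in region["copies"].items():
            if unit != failed_unit:
                return block
        raise IOError("no surviving mirror copy")

    # Parity protected data: XOR the remaining blocks of the set.
    survivors = [blk for unit, blk in region["blocks"].items()
                 if unit != failed_unit]
    return xor_blocks(survivors)
```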

[0038] Although a specific embodiment of the invention has been disclosed
along with certain alternatives, it will be recognized by those skilled in
the art that additional variations in form and detail may be made. In
particular, while the disclosed preferred embodiment employs magnetic disk
storage units, the invention is applicable to other storage device
technologies having erasable, read/write characteristics.



Claims

1. A method of operating a computer system having a set of storage blocks,
said set comprising a plurality of data storage blocks for containing data
and a parity protection storage block for containing the parity protection
information of the data stored in said data storage blocks, each of said
storage blocks being contained on a respective data storage unit, said
method comprising the steps of:

reconstructing data contained in a storage block, while the data storage
unit containing said storage block is failing, from the remaining storage
blocks in the set;

characterized by

storing data reconstructed by said reconstructing step on one of said data
storage units other than said failing data storage unit.

2. The method of operating a computer system of claim 1, wherein said
reconstructing data step reconstructs data when attempts are made to
access said data.

3. The method of operating a computer system of claim 1 or 2, wherein said
storing data step stores the reconstructed data in said parity protection
storage block.

4. The method of operating a computer system of claim 3, wherein said
reconstructing data step reconstructs data when attempts are made to
access said data.

5. The method of operating a computer system of any one of claims 1-4,
wherein said data storage units contain a spare storage block, and said
storing data step stores the reconstructed data in said spare storage
block.

A storage apparatus for a computer system, comprising:

at least three data storage units;

at least one set of storage blocks, each set comprising a plurality of
data storage blocks for containing data and a parity protection storage
block for containing the parity protection of the data stored in said data
storage blocks, each of said storage blocks being contained on a
respective data storage unit;

means for reconstructing the data contained in one of said data storage
blocks, while the data storage unit containing said block is failing, from
the remaining storage blocks in the set; and

means for storing said reconstructed data on one of said data storage
units.

6. A storage apparatus for a computer system, comprising

at least three data storage units;

at least one set of storage blocks, each set comprising a plurality of
data storage blocks for containing data and a parity protection storage
block for containing the parity protection of the data stored in said data
storage blocks, each of said storage blocks being contained on a
respective data storage unit;

means for reconstructing the data contained in one of said data storage
blocks, while the data storage unit containing said block is failing, from
the remaining storage blocks in the set;

characterized by

means for storing said reconstructed data on one of said data storage
units other than said failing data storage unit.

7. The storage apparatus for the computer system of claim 6, wherein said
means for storing said reconstructed data stores said data in said parity
protection storage block.

8. The storage apparatus for the computer system of claim 7, wherein said
means for reconstructing the data comprises a storage controller, said
storage controller comprising:

a programmable processor executing a storage management program; and

a non-volatile random access memory.

9. The storage apparatus for the computer system of claim 7, wherein said
data processing system comprises at least two of said set of storage
blocks; and

wherein said parity protection blocks are distributed among said data
storage units in a round robin manner.

10. The storage apparatus for the computer system of claim 7, wherein each
of said data storage units is a rotating magnetic disk drive storage unit.

11. The storage apparatus for the computer system of any one of claims
6-10, wherein each of said data storage units contains a spare storage
block; and

wherein said means for storing said reconstructed data stores said data in
one of said spare storage blocks.



Patentansprüche

1. Ein Verfahren zum Betreiben eines Rechnersystems mit einem Satz
Speicherblöcken, wobei dieser Satz eine Vielzahl von Datenspeicherblöcken
zum Aufnehmen der Daten und einen Paritätsschutz-Speicherblock zum
Aufnehmen der Paritätsschutzinformationen der in den Datenspeicherblöcken
gespeicherten Daten aufweist, wobei jeder der Speicherblöcke auf einer
entsprechenden Datenspeichereinheit enthalten ist, und das Verfahren
folgende Schritte aufweist:

Wiedergewinnen der in einem Speicherblock enthaltenen Daten bei Ausfall
der Datenspeichereinheit, die diesen Speicherblock enthält, aus den
restlichen Speicherblöcken in dem betreffenden Satz;

gekennzeichnet durch

das Speichern der durch diesen Wiedergewinnungsschritt rekonstruierten
Daten auf einer dieser Datenspeichereinheiten, die nicht die ausgefallene
Datenspeichereinheit ist.

2. Das Verfahren zum Betreiben eines Rechnersystems gemäß Anspruch 1, in
dem der Datenwiedergewinnungsschritt die Daten rekonstruiert, wenn
Versuche gemacht werden, auf diese Daten Zugriff zu nehmen.

3. Das Verfahren zum Betreiben eines Rechnersystems gemäß Anspruch 1 oder
2, in dem der Datenwiedergewinnungsschritt die rekonstruierten Daten im
Paritätsschutz-Speicherblock speichert.

4. Das Verfahren zum Betreiben eines Rechnersystems gemäß Anspruch 3, in
dem der Datenwiedergewinnungsschritt die Daten rekonstruiert, wenn
Versuche gemacht werden, auf diese Daten Zugriff zu nehmen.

5. Das Verfahren zum Betreiben eines Rechnersystems gemäß einem beliebigen
der Ansprüche 1-4, in dem die Datenspeichereinheiten einen
Ersatzspeicherblock enthalten, und der Datenspeicherschritt die
rekonstruierten Daten in dem Ersatzspeicherblock speichert.

Eine Speichervorrichtung für ein Rechnersystem, enthaltend:

wenigstens drei Datenspeichereinheiten;

wenigstens einen Satz Speicherblöcke, wobei jeder Satz eine Vielzahl von
Datenspeicherblöcken zur Aufnahme von Daten sowie einen
Paritätsschutz-Speicherblock zur Aufnahme des Paritätsschutzes der in den
Datenblöcken gespeicherten Daten, wobei jeder der Speicherblöcke in einer
entsprechenden Datenspeichereinheit enthalten ist;

Mittel zum Rekonstruieren der in einem der Datenspeicherblöcke enthaltenen
Daten bei Ausfall der die Blöcke enthaltenden Datenspeichereinheit aus den
restlichen Speicherblöcken im Satz; und

Mittel zum Speichern der rekonstruierten Daten auf einer der
Datenspeichereinheiten.

6. Eine Speichervorrichtung für ein Rechnersystem, enthaltend:

wenigstens drei Datenspeichereinheiten;

wenigstens einen Satz Speicherblöcke, wobei jeder Satz eine Vielzahl von
Datenspeicherblöcken zur Aufnahme von Daten sowie einen
Paritätsschutz-Speicherblock zur Aufnahme des Paritätsschutzes der in den
Datenblöcken gespeicherten Daten, wobei jeder der Speicherblöcke in einer
entsprechenden Datenspeichereinheit enthalten ist;

Mittel zum Rekonstruieren der in einem der Datenspeicherblöcke enthaltenen
Daten bei Ausfall der die Blöcke enthaltenden Datenspeichereinheit aus den
restlichen Speicherblöcken im Satz;

gekennzeichnet durch

Mittel zum Speichern der rekonstruierten Daten auf einer der
Datenspeichereinheiten, die nicht die ausgefallene Datenspeichereinheit
ist.

7. Die Speichervorrichtung für ein Rechnersystem gemäß Anspruch 6, in der
das Mittel zum Speichern der rekonstruierten Daten diese Daten in den
Paritätsschutz-Speicherblock speichert.



8. Die Speichervorrichtung für ein Rechnersystem gemäß Anspruch 7, in der
das Mittel zum Rekonstruieren der Daten einen Speicher-Controller umfaßt,
wobei dieser Speicher-Controller enthält:

einen programmierbaren Prozessor, der ein Speicherverwaltungsprogramm
abarbeitet; und

einen nichtflüchtigen Direktzugriffsspeicher.

9. Die Speichervorrichtung für das Rechnersystem gemäß Anspruch 7, in der
das Datenbearbeitungssystem mindestens zwei Speicherblöcke des
Speicherblocksatzes enthält; und

in dem die Paritätsschutzblöcke unter den Datenspeichereinheiten in
Zeitrasterfolge rundum verteilt sind.

10. Die Speichervorrichtung für ein Rechnersystem gemäß Anspruch 7, in der
jede der Datenspeichereinheiten eine rotierende
Magnetplattenlaufwerk-Speichereinheit ist.

11. Die Speichervorrichtung für ein Rechnersystem gemäß einem beliebigen
der Ansprüche 6-10, in dem jede der Datenspeichereinheiten einen
Ersatzspeicherblock enthält; und

in dem das Mittel zum Speichern der rekonstruierten Daten diese Daten in
einem der Ersatzspeicherblöcke speichert.

Revendications

1. Procédé de mise en œuvre d'un système d'ordinateur comportant un
ensemble de blocs de mémorisation, ledit ensemble comprenant une pluralité
de blocs de mémorisation de données destinés à contenir des données et un
bloc de mémorisation de protection de parité destiné à contenir les
informations de protection de parité des données mémorisées dans lesdits
blocs de mémorisation de données, chacun desdits blocs de mémorisation
étant contenu sur une unité de mémorisation de données respective, ledit
procédé comprenant les étapes consistant à :

reconstituer les données contenues dans un bloc de mémorisation, alors que
l'unité de mémorisation de données contenant ledit bloc de mémorisation
est défaillante, d'après les blocs de mémorisation restant dans
l'ensemble,

caractérisé par

des données de mémorisation reconstituées par ladite étape de
reconstitution sur l'une des unités de mémorisation de données autre que
ladite unité de mémorisation de données défaillante.

2. Procédé de mise en œuvre d'un système d'ordinateur selon la
revendication 1, dans lequel ladite étape de reconstitution des données
reconstitue des données lorsque des tentatives sont faites d'accéder
auxdites données.

3. Procédé de mise en œuvre d'un système d'ordinateur selon la
revendication 1 ou 2, dans lequel ladite étape de mémorisation des données
mémorise les données reconstituées dans ledit bloc de mémorisation de
protection de parité.

4. Procédé de mise en œuvre d'un système d'ordinateur selon la
revendication 3, dans lequel ladite étape de reconstitution des données
reconstitue les données lorsque des tentatives sont faites d'accéder
auxdites données.

5. Procédé de mise en œuvre d'un système d'ordinateur selon l'une
quelconque des revendications 1 à 4, dans lequel lesdites unités de
mémorisation de données contiennent un bloc de mémorisation de secours, et
ladite étape de mémorisation de données mémorise les données reconstituées
dans ledit bloc de mémorisation de secours.

Dispositif de mémorisation destiné à un système d'ordinateur, comprenant :

au moins trois unités de mémorisation de données,

au moins un ensemble de blocs de mémorisation, chaque ensemble comprenant
une pluralité de blocs de mémorisation de données destinés à contenir des
données et un bloc de mémorisation de protection de parité destiné à
contenir la protection de parité des données mémorisées dans lesdits blocs
de mémorisation, chacun desdits blocs de mémorisation étant contenu sur
une unité de mémorisation de données respective,

un moyen destiné à reconstituer les données contenues dans l'un desdits
blocs de mémorisation de données, alors que l'unité de mémorisation de
données contenant ledit bloc est défaillante, d'après les blocs de
mémorisation restant dans l'ensemble, et

un moyen destiné à mémoriser lesdites données reconstituées sur l'une
desdites unités de mémorisation de données.

6. Dispositif de mémorisation destiné à un système d'ordinateur,
comprenant

au moins trois unités de mémorisation de données,

au moins un ensemble de blocs de mémorisation, chaque ensemble comprenant
une pluralité de blocs de mémorisation de données destinés à contenir des
données et un bloc de mémorisation de protection de parité destiné à
contenir la protection de parité des données mémorisées dans lesdits blocs
de mémorisation de données, chacun desdits blocs de mémorisation étant
contenu sur une unité de mémorisation de données respective,

un moyen destiné à reconstituer les données contenues dans l'un desdits
blocs de mémorisation de données, alors que l'unité de mémorisation de
données contenant ledit bloc est défaillante, d'après les blocs de
mémorisation restant dans l'ensemble, et

caractérisé par

un moyen destiné à mémoriser lesdites données reconstituées sur l'une
desdites unités de mémorisation de données, autre que ladite unité de
mémorisation de données défaillante.

7. Dispositif de mémorisation destiné au système d'ordinateur selon la
revendication 6, dans lequel ledit moyen destiné à mémoriser lesdites
données reconstituées mémorise lesdites données dans lesdits blocs de
mémorisation de protection de parité.

8. Dispositif de mémorisation destiné au système d'ordinateur selon la
revendication 7, dans lequel ledit moyen destiné à reconstituer les
données comprend un contrôleur de mémorisation, ledit contrôleur de
mémorisation comprenant :

un processeur programmable exécutant un programme de gestion de
mémorisation, et

une mémoire vive non volatile.

9. Dispositif de mémorisation destiné au système d'ordinateur selon la
revendication 7, dans lequel ledit système de traitement de données
comprend au moins deux blocs dudit ensemble de blocs de mémorisation,

dans lequel lesdits blocs de protection de parité sont répartis parmi
lesdites unités de mémorisation de données suivant une séquence
périodique.

10. Dispositif de mémorisation destiné au système d'ordinateur selon la
revendication 7, dans lequel chacune desdites unités de mémorisation de
données est une unité de mémorisation à lecteur de disque magnétique
rotatif.

11. Dispositif de mémorisation destiné au système d'ordinateur selon l'une
quelconque des revendications 6 à 10, dans lequel chacune desdites unités
de mémorisation de données contient un bloc de mémorisation de secours, et

dans lequel ledit moyen destiné à mémoriser lesdites données reconstituées
mémorise lesdites données dans l'un desdits blocs de mémorisation de
secours.